Optimizing Kernel Block Memory Operations
Authors
Abstract
This paper investigates the performance of block memory operations in the operating system, including memory copies, page zeroing, interprocess communication, and networking. The performance of these common operating system operations is highly dependent on the cache state and future use pattern of the data, and no single routine maximizes performance in all situations. Current systems use a statically selected algorithm to perform block memory operations. This paper introduces a method to dynamically predict the optimal algorithm for each block memory operation: predicting both the current state of the cache and whether the target data will be reused before it is evicted. By using these predictions to select the optimal software algorithm for each operation, the performance of kernel copy operations can be improved.
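As a rough illustration of the selection step described above (a minimal sketch, not the paper's actual implementation; all names here, such as `select_copy_strategy` and the `dst_likely_reused` flag, are assumptions standing in for the paper's cache-state and reuse predictors):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only. The kernel chooses a copy routine per
 * operation from two predictions: the current cache state and whether
 * the destination will be reused before eviction. */

enum copy_strategy {
    COPY_TEMPORAL,     /* ordinary cache-allocating copy */
    COPY_NON_TEMPORAL  /* streaming stores that bypass the cache */
};

/* If the destination is predicted to be reused before it is evicted,
 * keeping it in cache wins; otherwise a non-temporal copy avoids
 * displacing useful cache lines. */
static enum copy_strategy
select_copy_strategy(int dst_likely_reused, int src_in_cache)
{
    (void)src_in_cache; /* a fuller model would also weigh source state */
    return dst_likely_reused ? COPY_TEMPORAL : COPY_NON_TEMPORAL;
}

/* Dispatch on the prediction. Both paths fall back to memcpy in this
 * sketch; a kernel would substitute, e.g., a non-temporal-store loop
 * for the COPY_NON_TEMPORAL case. */
static void *
dynamic_copy(void *dst, const void *src, size_t n, int dst_likely_reused)
{
    switch (select_copy_strategy(dst_likely_reused, /*src_in_cache=*/1)) {
    case COPY_NON_TEMPORAL:
        return memcpy(dst, src, n); /* stand-in for a streaming copy */
    case COPY_TEMPORAL:
    default:
        return memcpy(dst, src, n); /* stand-in for a cached copy */
    }
}
```

The point of the sketch is only the dispatch structure: the hard part in practice, which the paper addresses, is making the two predictions cheaply and accurately at run time.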
Similar resources
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY
Sparse matrix-vector multiplication is an important computational kernel that tends to perform poorly on modern processors, largely because of its high ratio of memory operations to arithmetic operations. Optimizing this algorithm is difficult, both because of the complexity of memory systems and because the performance is highly dependent on the nonzero structure of the matrix. The Sparsity sy...
Characterization of Block Memory Operations
Block memory operations are frequently performed by the operating system and consume an increasing fraction of kernel execution time. These operations include memory copies, page zeroing, interprocess communication, and networking. This thesis demonstrates that performance of these common OS operations is highly dependent on the cache state and future use pattern of the data. This thesis argues...
Optimizing Sparse Matrix-vector Multiplication Based on GPU
In recent years, Graphics Processing Units (GPUs) have attracted the attention of many application developers as powerful massively parallel systems. The Compute Unified Device Architecture (CUDA), a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. The Sparse Matrix-vector Multiplication (SpMV) algorithm...
A Parallel Computational Kernel for Sparse Nonsymmetric Eigenvalue Problems on Multicomputers
The aim of this paper is to present a reorganization of the nonsymmetric block Lanczos algorithm that is efficient, portable, and scalable on multiple instruction multiple data (MIMD) distributed memory message passing architectures. The basic operations implemented here are matrix-matrix multiplications, possibly with a transposed and a sparse factor, LU factorisation, and triangular systems sol...
Optimizing Performance on Modern HPC Systems: Learning From Simple Kernel Benchmarks
We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel Netburst, and AMD64 variants) in single-CPU and shared memory environments. Using selected kernel benchmarks that represent data-intensive applications, we focus on the attainable effective bandwidths, which are still suboptimal with current compilers. We stress the need for a...
Journal:
Volume Issue
Pages -
Publication year 2006